[AISOS-741] Add auto review primitive to Forge by JoshSalomon · Pull Request #119 · forge-sdlc/forge

JoshSalomon · 2026-07-02T09:53:23Z

Summary

This PR implements the Auto-Review Primitive, a new capability that enables automated quality gates for skill execution in Forge. When a skill includes a review.md file, the system automatically spawns a reviewer agent after execution to validate output quality, providing structured feedback and retry loops until the work is approved or max retries are exhausted. This feature reduces manual review burden and catches quality issues earlier in the development cycle.

Changes

Container Review Loop Engine (`containers/review.py`)

Created Verdict enum (APPROVED/REJECTED) and ReviewConfig/ReviewCycleData dataclasses
Implemented parse_review_config() for parsing YAML frontmatter from review.md files
Implemented detect_review_md() with project override precedence (mirrors skill resolver pattern)
Implemented parse_verdict() for case-insensitive APPROVED/REJECTED marker extraction
Implemented write_cycle_file() for JSON output to .forge/{step-name}/review_cycle_N.json

Container Entrypoint Integration (`containers/entrypoint.py`)

Added run_review_loop() orchestrating the reviewer agent spawn, verdict parsing, and retry logic
Added run_reviewer_agent() and run_worker_with_feedback() for feedback injection
Added load_conversation_history() for loading prior context on retries
Integrated terminal progress display with TTY detection (_print_review_progress())

Observability Infrastructure (`src/forge/observability/`)

Created ReviewCyclePoller class with async polling loop for detecting review cycle files during container execution
Created ReviewCycleRecorder with mode switching (log/copy/None) for recording detected cycles
Created ReviewJiraNotifier for posting formatted Jira comments with rate limiting
Added Prometheus metrics: forge_review_cycles_total, forge_review_verdicts_total, forge_review_duration_seconds

Container Runner Integration (`src/forge/sandbox/runner.py`)

Extended ContainerResult dataclass with review_cycles field
Added step_name parameter to ContainerRunner.run() method
Implemented _poll_review_cycles() background task for async polling during execution
Implemented _sweep_review_cycles() for post-execution sync sweep to catch missed files

Workflow Node Updates

Updated all ContainerRunner-invoking nodes to pass step_name parameter:
- implementation.py: step_name='implement_task'
- local_reviewer.py: step_name='local_review'
- ci_evaluator.py: step_name='analyze_ci'/'fix_ci'
- code_review.py: step_name='code_review'
- docs_updater.py: step_name='update_docs'
- implement_review.py: step_name='implement_review_analyze'/'implement_review_fix'
- plan_bug_fix.py: step_name='plan_bug_fix'
- rca_analysis.py: step_name='analyze_bug'/'reflect_rca'
- rebase.py: step_name='rebase'

Configuration (`src/forge/config.py`)

Added AUTO_REVIEW_MAX_RETRIES: default max retry attempts (default: 3)
Added AUTO_REVIEW_POLL_INTERVAL: polling interval in seconds (default: 5)
Added AUTO_REVIEW_RECORD_POLLED_FILES: recording mode ('log'/'copy'/None)

Sample Review Files

Created skills/default/implement-task/review.md with generic code review criteria
Created skills/default/local-code-review/review.md with pre-PR review criteria
Created skills/aisos/implement-task/review.md as project-specific override example

Documentation

Created docs/reference/review-md-schema.md with formal schema specification
Created docs/guide/auto-review.md with comprehensive user documentation
Updated docs/skills/authoring.md, docs/skills/index.md, docs/skills/defaults.md to reference auto-review
Updated docs/reference/config.md with new configuration settings

Dependencies

Added aiofiles>=24.1.0 to pyproject.toml for async file operations

Implementation Notes

Precedence Logic: detect_review_md() mirrors the skill resolver pattern—project-specific overrides (skills/{project}/{skill}/review.md) take precedence over defaults (skills/default/{skill}/review.md)
Verdict Parsing: Uses position-based precedence when both APPROVED and REJECTED markers are present (first occurrence wins)
Async + Sync Polling: The runner uses async polling during execution plus a synchronous post-execution sweep to ensure no files are missed on fast container exits
Rate Limiting: Jira notifier includes configurable rate limiting (default 30s) to prevent comment spam
TTY Detection: Terminal progress output only displays when stdout is a TTY to avoid polluting logs in CI environments
Graceful Degradation: Malformed YAML in review.md logs a warning and falls back to defaults (per BR-006)

Testing

Added 200+ unit tests across new modules:
- tests/unit/containers/test_review.py: 57 tests for review.py functions
- tests/unit/observability/test_review_poller.py: 32 tests for async polling
- tests/unit/observability/test_review_recorder.py: 34 tests for recording modes
- tests/unit/observability/test_review_notifier.py: 31 tests for Jira notification
- tests/unit/sandbox/test_runner_review_polling.py: 25+ tests for runner integration
All 630+ existing unit tests continue to pass
Manual verification of terminal progress display format

Related Tickets

Generated by Forge SDLC Orchestrator

… ReviewCycleData dataclasses Detailed description: - Created containers/review.py with: - Verdict StrEnum with APPROVED and REJECTED values - ReviewConfig dataclass with max_retries (int, default 3) and instructions (str) - ReviewCycleData dataclass with all specified fields - parse_review_config() function for parsing YAML frontmatter - Function supports AUTO_REVIEW_MAX_RETRIES env var fallback (SC-005) - Malformed YAML logs warning and returns defaults (SC-012/BR-006) - Added 23 unit tests in tests/unit/containers/test_review.py Closes: AISOS-2053

Implemented detect_review_md(skill_name, ticket_key, skills_base) -> Path | None that locates review.md files with project override precedence: - Mirrors precedence logic from src/forge/skills/resolver.py - Checks project-specific path first: skills/{project}/{skill_name}/review.md - Falls back to default: skills/default/{skill_name}/review.md - Extracts project key from ticket_key (e.g., 'AISOS-123' -> 'aisos') - Returns None if review.md doesn't exist (SC-010) - Uses is_file() to correctly handle edge case of directory named review.md Key files modified: - containers/review.py: Added detect_review_md function with debug logging - tests/unit/containers/test_review.py: Added 11 comprehensive unit tests - tests/unit/containers/conftest.py: Created to isolate from forge imports Supports: - SC-006: Project override precedence - SC-010: No reviewer when review.md absent Closes: AISOS-2054

Added parse_verdict(output_text: str) -> tuple[Verdict, str] to containers/review.py: - Performs case-insensitive search for APPROVED and REJECTED markers - Returns (Verdict.APPROVED, "") when APPROVED marker found first (SC-002) - Returns (Verdict.REJECTED, feedback) when REJECTED marker found first, extracting all text after the marker as feedback (SC-002) - Returns (Verdict.REJECTED, "Verdict could not be parsed") when neither marker present - When both markers present, first occurrence wins (position-based) - Feedback text is stripped of leading/trailing whitespace - Empty feedback handled correctly (SC-003 edge case) Added 23 comprehensive unit tests covering: - Case-insensitive matching (uppercase, lowercase, mixed case) - Feedback extraction from text after REJECTED marker - Edge cases: empty string, whitespace-only, neither marker - Both markers present with position-based precedence - Real-world review scenarios All 57 tests pass. Closes: AISOS-2055

Added write_cycle_file(workspace, step_name, cycle_data) to containers/review.py: - Creates .forge/{step_name}/ directory if it doesn't exist - Writes review_cycle_N.json where N is cycle_data.cycle - Serializes ReviewCycleData to JSON with 2-space indentation - Ensures verdict is lowercase string ('approved' or 'rejected') - Handles both Verdict enum and string values for verdict field - Uses ensure_ascii=False for proper Unicode handling Added 18 unit tests in TestWriteCycleFile class verifying: - Directory creation (.forge/ and step subdirectory) - File output path pattern (review_cycle_N.json) - All required JSON fields (SC-007) - ISO 8601 UTC timestamp format preservation - Verdict lowercase conversion and enum handling - Multiple cycles written separately - JSON pretty printing - File overwrite behavior - Empty/multiline/special character feedback handling - Float precision for elapsed_seconds - Step name variations (hyphens, underscores) Closes: AISOS-2056

Integrates the review loop into the container entrypoint after skill execution: - Import review functions from containers/review.py (detect_review_md, parse_review_config, parse_verdict, write_cycle_file, ReviewCycleData, Verdict) - Add run_reviewer_agent() to spawn reviewer with review.md instructions (SC-001) - Add load_conversation_history() to load from .forge/history/{task_key}.json - Add run_worker_with_feedback() for retry with feedback injection (SC-003): - Injects '## Reviewer Feedback' section into task description - Loads conversation history for context - Add run_review_loop() for complete review loop: - Parse config via parse_review_config (SC-004, SC-005) - Track cycle timing with time.perf_counter() - Parse verdict via parse_verdict (SC-002) - Write cycle file to .forge/{step_name}/review_cycle_N.json (SC-007) - Exit on APPROVED or max retries exhausted (BR-005) - Modify main() to: - Extract skill_name from task_data or FORGE_SKILL_NAME env var - Check for review.md via detect_review_md after run_agent_task - Run review loop if review.md exists (SC-001, SC-010) Add 17 unit tests covering all acceptance criteria: - load_conversation_history: loads existing, handles missing, handles JSON error - run_worker_with_feedback: feedback injection, history loading - run_review_loop: APPROVED exits, REJECTED retries, max retries, frontmatter, env var fallback, cycle file path, reviewer instructions, timing, error handling - main integration: skips when no skill_name, skips when no review.md, runs when exists Closes: AISOS-2057

… mode Add terminal progress output to the review loop in containers/entrypoint.py: - Added TTY detection with _IS_TTY = sys.stdout.isatty() - Added _print_review_progress() function for terminal output: - Format: "Review cycle N/M: VERDICT - [truncated feedback]" - Only outputs when stdout is a TTY (local/terminal mode) - Truncates feedback to 200 chars with ellipsis when exceeded - Uses flush=True for immediate output (SC-011: within 1 second) - Integrated into run_review_loop() immediately after verdict determination - Added 8 unit tests for the new functionality Supports SC-011: terminal displays progress within 1 second of verdict. Closes: AISOS-2058

Implements async polling for review cycle files during container execution. Changes: - Create ReviewCyclePoller class in src/forge/observability/review_poller.py - Accepts step_name parameter for step-specific path detection - Polls .forge/{step-name}/review_cycle_*.json files - Implements async polling loop with configurable interval - JSON parsing with retry handles race conditions with container writes - Tracks already-processed files to avoid duplicates - Add ReviewCycleData dataclass for orchestrator-side review cycle data - Add AUTO_REVIEW_POLL_INTERVAL (default 5s) to forge.config.Settings - Add aiofiles>=24.1.0 dependency to pyproject.toml - Export ReviewCyclePoller and ReviewCycleData from forge.observability - Add 32 unit tests with full coverage Closes: AISOS-2059

…ode switching Implemented review cycle recording functionality for observability: - Created src/forge/observability/review_recorder.py with: - ReviewCycleData dataclass with fields: cycle, max_cycles, verdict, feedback, skill, elapsed_seconds, timestamp (datetime type) - to_dict() and to_json() methods for serialization - ReviewCycleRecorder class with step_name parameter and mode switching: - mode='log': logs cycle data at INFO level with structured format - mode='copy': copies files to {recording_dir}/{step-name}/review_cycle_*.json - mode=None: recording disabled - step_dir property for step-specific subdirectory path - record() method for logging, record_file() method for copying - Added AUTO_REVIEW_RECORD_POLLED_FILES setting to forge.config.Settings - Type: Literal['log', 'copy'] | None (default: None) - Placed under Auto Review Configuration section - Updated observability module __init__.py to export new classes - Created comprehensive unit tests (34 tests) in tests/unit/observability/test_review_recorder.py covering: - ReviewCycleData fields, to_dict(), to_json() - ReviewCycleRecorder initialization with all mode combinations - Log mode INFO level logging with structured format - Copy mode file copying and directory creation - Error handling for missing files and directory creation failures - Settings configuration validation All tests pass. Closes: AISOS-2060

Extend src/forge/api/routes/metrics.py with Prometheus metrics for review cycle monitoring: - forge_review_cycles_total Counter: Total review cycles detected, labeled by skill and step - forge_review_verdicts_total Counter: Review verdicts by outcome (approved/rejected), labeled by skill, step, and verdict - forge_review_duration_seconds Histogram: Review cycle duration, labeled by skill and step, using same buckets as AGENT_DURATION [1, 5, 10, 30, 60, 120, 300, 600] Added helper functions for recording metrics: - record_review_cycle(skill, step) - record_review_verdict(skill, step, verdict) - observe_review_duration(skill, step, duration) Added 11 unit tests in TestReviewCycleMetrics class verifying: - Counter and histogram existence and labeling - Helper function increment/observe behavior - Metrics visibility in /metrics endpoint output Closes: AISOS-2061

Create src/forge/observability/review_notifier.py module for posting Jira comments when review cycles are detected during container execution. Changes: - Add ReviewJiraNotifier class with rate limiting support (default 30s) - Format ReviewCycleData as readable Jira comments with: - Cycle number and max cycles with header - Verdict with ✅/❌ icons (APPROVED/REJECTED) - Skill name and duration (seconds or minutes) - Feedback summary (truncated at 500 chars with word boundary) - Handle Jira API errors gracefully: log WARNING and continue - Add NotifyResult dataclass for notify operation results - Export ReviewJiraNotifier and NotifyResult from forge.observability - Add 31 comprehensive unit tests covering: - Initialization and defaults - Rate limiting behavior - Comment formatting - Error handling - Integration scenarios Follows existing JiraClient patterns from forge.workflow.utils.jira_status Closes: AISOS-2062

…name parameter Implementation: - Extended ContainerResult dataclass with review_cycles field (list[ReviewCycleData]) - Added optional step_name parameter to ContainerRunner.run() method - Added _poll_review_cycles() background task that: - Polls workspace for .forge/{step-name}/review_cycle_*.json files - Records cycles via ReviewCycleRecorder (log mode) - Emits Prometheus metrics for observability - Added _poller_to_recorder_cycle() helper for ReviewCycleData conversion - Background polling task cancelled cleanly on container exit - Final poll performed after container exits to catch remaining files - Collected review_cycles included in all ContainerResult returns Unit tests: - 14 tests in tests/unit/sandbox/test_runner_review_polling.py covering: - ContainerResult review_cycles field - Poller to recorder cycle conversion - Step name parameter acceptance - Background polling task lifecycle - Review cycle collection and metrics recording - Step name path organization Closes: AISOS-2063

Added _sweep_review_cycles() private method to ContainerRunner that performs a synchronous post-execution sweep after the container exits. This catches any review cycle files that may have been missed during async polling, especially when containers exit quickly after writing files. Implementation details: - Scans .forge/{step-name}/review_cycle_*.json for all files - Deduplicates against files already processed by async poller - Parses and validates JSON, adding to review_cycles list - Records via recorder and emits Prometheus metrics - Logs warning when files are missed during async polling - Handles edge cases: invalid JSON, missing fields, empty files Added 11 unit tests covering: - Basic sweep functionality and deduplication - Edge case handling (missing dir, invalid JSON, etc.) - Integration tests for fast-exit scenario - Warning logging verification Closes: AISOS-2064

Updated all workflow nodes that invoke ContainerRunner to pass the step_name parameter for proper review cycle file organization: - implementation.py: step_name='implement_task' - local_reviewer.py: step_name='local_review' (bug and feature reviews) - ci_evaluator.py: step_name='analyze_ci' and step_name='fix_ci' - code_review.py: step_name='code_review' - docs_updater.py: step_name='update_docs' - implement_review.py: step_name='implement_review_analyze' and 'implement_review_fix' - plan_bug_fix.py: step_name='plan_bug_fix' - rca_analysis.py: step_name='analyze_bug' and step_name='reflect_rca' - rebase.py: step_name='rebase' This enables the orchestrator to poll for and record files at .forge/{step-name}/review_cycle_*.json during container execution. Added unit tests for step_name propagation in both implementation.py and local_reviewer.py. All 630+ unit tests pass. Closes: AISOS-2065

…fication Created docs/reference/review-md-schema.md with complete schema specification: - Overview explaining purpose of review.md files - File location and naming convention (review.md alongside SKILL.md) - Per-project override precedence rules (project > default) - YAML frontmatter schema with max_retries field (optional, default: 3) - Priority chain: frontmatter > AUTO_REVIEW_MAX_RETRIES env var > default - Prose body format for reviewer instructions with best practices - Edge cases: empty file, frontmatter-only, missing frontmatter, malformed YAML - Review cycle output path: .forge/{step-name}/review_cycle_N.json - JSON schema for review cycle output files - Validation summary table Follows existing documentation style from docs/reference/ files and is consistent with SKILL.md frontmatter patterns. Closes: AISOS-2066

Add default review instructions for code implementation tasks with: - YAML frontmatter with max_retries: 3 - Four review criteria: test coverage, error handling, documentation, no debug code - Language-agnostic examples (Python, Go, Node.js, Rust) - Instructions to use git diff origin/main...HEAD to see changes - APPROVED/REJECTED verdict markers - Structured feedback format for rejections with file/line/criterion Closes: AISOS-2067

Add default review instructions for pre-PR code review quality gate. The review.md complements the existing SKILL.md by adding an automated quality gate that runs after the local-code-review skill completes. Key features: - YAML frontmatter with max_retries: 2 (lower than implement-task's 3 since this is a secondary review pass) - Instructions to detect and run project test command (pyproject.toml, package.json, Makefile, Cargo.toml, go.mod) - Four review criteria: 1. Breaking changes: API/interface changes without migration 2. Test failures: all tests must pass locally 3. Security issues: hardcoded secrets, SQL injection, path traversal 4. Spec alignment: changes match task description - APPROVED/REJECTED verdict format with structured feedback examples Closes: AISOS-2068

…plement-task Created skills/aisos/implement-task/review.md as a reference example demonstrating per-project skill customization. The override: - Sets max_retries: 2 (customized from default of 3) - Demonstrates inheritance by referencing default review criteria - Adds Forge-specific criteria: - Testing with pytest (fixtures, parametrize, pytest-asyncio) - Type hints required on all public functions (PEP 604 style) - Logging via logging module (no print() for diagnostics) - Proper async/await patterns (gather vs sequential, no blocking I/O) - Includes documentation for skill authors on how to create overrides - Shows feedback format examples with Forge-specific file paths This serves as documentation by example for teams creating their own project-specific skill overrides. Closes: AISOS-2069

Detailed description: - Created docs/guide/auto-review.md documenting the auto-review primitive - Covers all required sections: - Overview: What auto-review is and when to use it - Quick Start: Minimal review.md example - Configuration: max_retries in frontmatter, AUTO_REVIEW_MAX_RETRIES env var, default (3) - Writing Review Instructions: How to write effective reviewer prompts - Verdict Protocol: APPROVED/REJECTED markers with examples and feedback format - Per-Project Overrides: How skill resolution works for review.md - Observability: Review cycle files at .forge/{step-name}/review_cycle_N.json - Troubleshooting: Timeout, parsing failures, endless loops, and more - Includes mermaid flowchart showing review loop - Links to related docs (skills authoring, review-md-schema) - Follows documentation style from existing docs/guide/ files Closes: AISOS-2070

Updated three documentation files to reference the new auto-review capability: - docs/skills/authoring.md: Added Auto-Review section explaining that skills can include review.md for automatic quality gates, with link to the guide - docs/skills/index.md: Updated directory layout to show review.md alongside SKILL.md, added Optional Files table with review.md description - docs/skills/defaults.md: Added Auto-review notes under implement-task and local-code-review sections describing what their review.md files check All changes link to docs/guide/auto-review.md for full details. Changes are minimal and focused — the full documentation is in the auto-review guide. Closes: AISOS-2071

Fixed ruff I001 lint error in src/forge/observability/__init__.py: - Changed multi-line import blocks to single-line imports - Imports now properly sorted alphabetically by module name All tests pass, no breaking issues found in code review. Closes: AISOS-741-review

…ig reference Added new Auto-Review section to docs/reference/config.md documenting: - AUTO_REVIEW_MAX_RETRIES: default max retry attempts - AUTO_REVIEW_POLL_INTERVAL: polling interval for review cycle files - AUTO_REVIEW_RECORD_POLLED_FILES: recording mode for observability These settings were added to src/forge/config.py as part of the auto-review feature but were not reflected in the configuration reference documentation. Closes: AISOS-741-docs

The entrypoint imports from review.py but the Containerfile only copied entrypoint.py. This caused ImportError at container startup. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

The test_review.py tests import from the containers/ directory which isn't a Python package. Added sys.path setup in the test conftest following the existing pattern used by integration tests. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

poll() was declared async but only did 'return self'. As an async def, this wraps self in a coroutine, breaking 'async for' on Python 3.12 (CI) which requires __aiter__ on the direct return value. Making it a regular method returns self directly. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

poll() is correctly async (returns a coroutine wrapping self). The caller in runner.py must await it to get the async iterator. The previous fix incorrectly made poll() sync, which broke the poller tests on Python 3.12. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

The poll() method is async, so the mock must be AsyncMock for await to work on Python 3.12. MagicMock returns a non-awaitable which fails with "can't be used in 'await' expression". Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

poll() only sets a flag and returns self — no I/O, no reason to be async. As async def, it wraps self in a coroutine which breaks 'async for' on Python 3.11 (CI) where coroutines don't satisfy __aiter__. Making it sync aligns with how runner.py calls it: 'async for new_cycles in poller.poll():' Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Forge added 21 commits July 2, 2026 12:37

JoshSalomon commented Jul 2, 2026

View reviewed changes

Comment thread containers/entrypoint.py

JoshSalomon and others added 6 commits July 2, 2026 13:24

fix: copy review.py module into container image

a038452

The entrypoint imports from review.py but the Containerfile only copied entrypoint.py. This caused ImportError at container startup. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[AISOS-741] Add auto review primitive to Forge#119

[AISOS-741] Add auto review primitive to Forge#119
JoshSalomon wants to merge 27 commits into
forge-sdlc:mainfrom
JoshSalomon:forge/aisos-741

JoshSalomon commented Jul 2, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

JoshSalomon commented Jul 2, 2026

Summary

Changes

Container Review Loop Engine (containers/review.py)

Container Entrypoint Integration (containers/entrypoint.py)

Observability Infrastructure (src/forge/observability/)

Container Runner Integration (src/forge/sandbox/runner.py)

Workflow Node Updates

Configuration (src/forge/config.py)

Sample Review Files

Documentation

Dependencies

Implementation Notes

Testing

Related Tickets

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Container Review Loop Engine (`containers/review.py`)

Container Entrypoint Integration (`containers/entrypoint.py`)

Observability Infrastructure (`src/forge/observability/`)

Container Runner Integration (`src/forge/sandbox/runner.py`)

Configuration (`src/forge/config.py`)